Idefics2 is an open-source multimodal model capable of accepting arbitrary sequences of image and text inputs to generate text outputs. It shows significant improvements in OCR, document understanding, and visual reasoning.
Image-to-Text
Transformers English